Cargo train derailments have occurred frequently in Malaysian train services
during the last decade. Many factors contribute to these incidents,
especially the total amount of weight carried. Severe derailments cause
loss of life and damage to property every year.
If the amount of weight carried by a cargo train could be accurately forecast
in advance, its detrimental effect could be greatly reduced.
This paper presents the application of an Artificial Neural Network (ANN) to
predict the amount of weight carried by cargo trains, with KTMB used as the
case study. As KTMB carries many types of cargo,
this study focuses only on cement carried on twelve (12)
different routes. The ANN is used to
develop a predictive model with three (3) different
training algorithms: Levenberg-Marquardt (LM), Quick Propagation (QP)
and Conjugate Gradient Descent (CGD). The best training algorithm is
selected to predict the amount of carried weight by comparing the error
measures of all the training algorithms, namely Root Mean Squared Error
(RMSE) and Mean Absolute Percentage Error (MAPE). The results
indicate that the ANN technique is suitable for predicting the amount of
carried weight.


This is an open access article under the CC BY-SA license. 


Corresponding Author: 


S. Sarifah Radiah Shariff, 
Malaysia Institute of Transport, 
Universiti Teknologi MARA, 
Shah Alam, Selangor, Malaysia. 
Email: shari990@uitm.edu.my


1. INTRODUCTION 

Cargo or freight refers to goods or products that are transferred or distributed, generally for
commercial gain. Nowadays, cargo can be carried by water, air or land. Road transport is the most
widely used mode for carrying cargo, and different vehicles and load capacities are used to move it.
Road transport has many advantages: it allows door-to-door delivery and offers several types of
vehicles, such as trucks, buses, lorries and cars. However, bulky items such as sugar, cement and charcoal
that need to be transferred in large volumes are moved by train or rail transport.

Besides carrying passengers, trains are also capable of transporting large volumes of
items such as water, cement, steel, wood and coal. Generally, a cargo train has a direct route to its destination.
Under the right conditions, cargo transport by rail is more economical and more productive than road
transport, especially when transporting items in large volumes over long distances. The choice of mode of
transportation depends very much on carried weight, which is an important consideration in any transport
system. In a logistics transportation system, the amount of weight carried is very
important to ensure that all goods arrive safely at the destination on time. Using trains as a mode of
transportation is also beneficial to the environment, as it limits greenhouse gas emissions, increases fuel
efficiency and reduces the carbon footprint [1].

KTMB Freight Service has three types of train services: Train Contena Service, Train Cargo
Conventional Service and Train Landbridge Service. In 2017, KTMB experienced three major
derailments. On August 21, 2017, a cargo train crashed at Jalan Kucing, causing delays for a few days [2].
On September 23, 2017, a KTMB cargo train snapped an electrical cable between Rawang and Kuang stations,
forcing KTMB to close all tracks for two days [3]. On November 23, 2017, another cargo
train accident occurred when twelve cargo trains travelling southward between the National Bank Station
and Kuala Lumpur Station slipped due to the heavy weight and oversized loads they carried.
As a result, KTM and ETS services were disrupted on several routes around the Klang Valley. One of the
major causes of this accident was the overloading of the cargo train's wagons [4]. A more recent accident,
on 21 July 2019, involved a cargo train carrying 30 wagons of cement. After the derailment, KTMB needed to
relocate all the wagons as soon as possible because all of KTMB's services were affected [5].
Derailments happen due to many factors, and one of the most significant is the amount
of carried weight. Planning the amount of carried weight to match the track capability can help avoid
derailments. The Artificial Neural Network (ANN) is a popular method used by previous
researchers to predict carried weight. In this study, the carried weight of cargo trains is predicted.

Previous research has demonstrated that ANN is an efficient
prediction strategy [6-10]. This is supported by [11-12], who found that ANN performs better
than the Adaptive Neuro-Fuzzy Inference System (ANFIS). That work compared the models with the
American Concrete Institute and Iranian Concrete Institute empirical codes, and the ANN predictions
were better than those of the ANFIS model. The authors of [13] developed a decision support system that
forecasts demand in the electronics retail industry in Turkey using ANN training techniques such as
Gradient Descent (GD), Conjugate Gradient Descent (CGD), Quick Propagation (QP) and LM. However, the
application of these techniques in the multi-stage supply chain area is still severely lacking.

Further studies focus on how predictive ability is influenced by the training
and testing algorithm. In [14], ANN is used to predict carried weight with
three (3) classes of ANN training: the incremental back propagation algorithm (IBP), the
Genetic algorithm (GA) and the Levenberg-Marquardt algorithm (LM). The predictive performance of the three
algorithms was compared. The study was applied in an automobile company, Iran Khodro Company (IKCO),
in order to provide machinery resources, labor and transport capacity appropriately. ANN was
used to test weekly carried weight data based on observations of the number of vehicles
and fuel consumption. At the end of the study, IBP proved to be the best training algorithm.
As an improvement, [15] used the same variables as the previous research to predict the carried weight.
Instead of GA and IBP, Quick Propagation (QP) and Batch Back Propagation (BBP) were used, and QP
exhibited the better performance. Hence, this paper presents the application of the Artificial Neural Network
(ANN) to predict the amount of carried weight of cargo trains, using three training algorithms:
the Levenberg-Marquardt algorithm (LM), which has performed well in predicting different sets of carried
weight data; Conjugate Gradient Descent (CGD), which has performed well in predicting other sets of
data; and Quick Propagation (QP), as a new algorithm used to predict carried weight.


2. RESEARCH METHODS 

An ANN is a mathematical or computational model that imitates
the biological neural system. It is an adaptive system, as it can modify its structure based
on the internal or external information that flows through the network [16]. The model is a flexible
computing framework and a universal approximator. It can be applied to a wide range of problems, such as
time series forecasting, with a high degree of accuracy. An ANN replicates the biological neuron structure by
creating simple processing units called artificial neurons. The 3-dimensional
interconnectedness of biological neurons is approximated in an ANN by means of layers. Figure 1 shows
an ANN with input nodes, hidden nodes, and one output node. The hidden nodes will be generated using the
different built-in algorithms.

Figure 1. Artificial neural network model (ANN)
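As a minimal sketch of the layered structure described above, the following Python code computes one forward pass through a network with input nodes, one hidden layer and a single output node. The layer sizes (five inputs, five hidden nodes), the logistic activation, the linear output and the random placeholder weights are assumptions made for illustration; they are not taken from the software used in this study.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network with a single output node."""
    # Each hidden node applies a logistic activation to a weighted sum of the inputs.
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    # The output node is linear here, which suits a continuous target.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


random.seed(0)
n_in, n_hidden = 5, 5  # hypothetical layer sizes
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

x = [0.2, 0.5, 0.1, 0.9, 0.4]  # hypothetical scaled input values
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y_hat)
```

Training then amounts to adjusting the weights and biases so that the output matches the observed target as closely as possible.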


2.1. Training algorithm 

Three built-in training algorithms are used and compared.
a. Levenberg-Marquardt (LM)

LM is a higher-order adaptive algorithm that minimizes the Mean Square Error of a
neural network [17]. The LM algorithm is a variation of Newton's method designed for minimizing
functions that are sums of squares of other nonlinear functions. It provides a numerical solution to
the problem of minimizing a non-linear function. The (non-negative) damping parameter $\lambda$ is adjusted
in every iteration: small values of $\lambda$ result in a Gauss-Newton update, and large values of $\lambda$
result in a gradient descent update. The parameter $\lambda$ is initialized to be large so that the first
updates are small steps in the steepest descent direction. If any iteration leads to a poor approximation,
$\lambda$ is increased. Therefore, for large values of $\lambda$, the step is taken approximately in the
direction of the gradient. Otherwise, as the solution improves, $\lambda$ is decreased, the LM method
approaches the Gauss-Newton method, and the solution typically accelerates to the local minimum.
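The damping rule described above can be sketched in a few lines of Python. The example fits a hypothetical two-parameter model y = a*exp(b*x) to noise-free data; the model, the data, the starting point and the factor of 10 used to adjust the damping parameter are all assumptions chosen for illustration, not the implementation inside the software used in this study.

```python
import math


def predict(theta, x):
    a, b = theta
    return a * math.exp(b * x)


def sum_sq_error(theta, xs, ys):
    try:
        return sum((y - predict(theta, x)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return float("inf")  # treat an exploding trial step as a failed step


def lm_fit(xs, ys, theta, lam=100.0, max_iter=200):
    """Levenberg-Marquardt for the two-parameter model above."""
    for _ in range(max_iter):
        a, b = theta
        g1 = g2 = h11 = h12 = h22 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e                 # residual
            j1, j2 = e, a * x * e         # derivatives of the model w.r.t. a and b
            g1 += j1 * r
            g2 += j2 * r
            h11 += j1 * j1
            h12 += j1 * j2
            h22 += j2 * j2
        # Solve (J^T J + lam * I) d = J^T r for the 2x2 case.
        a11, a22 = h11 + lam, h22 + lam
        det = a11 * a22 - h12 * h12
        d1 = (a22 * g1 - h12 * g2) / det
        d2 = (a11 * g2 - h12 * g1) / det
        if abs(d1) + abs(d2) < 1e-12:
            break
        trial = (a + d1, b + d2)
        if sum_sq_error(trial, xs, ys) < sum_sq_error(theta, xs, ys):
            theta, lam = trial, lam / 10.0   # good step: move towards Gauss-Newton
        else:
            lam *= 10.0                      # poor step: move towards gradient descent
    return theta


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # hypothetical data with a = 2, b = 0.5
a_hat, b_hat = lm_fit(xs, ys, theta=(1.0, 0.1))
print(a_hat, b_hat)
```

A trial step is accepted only when it lowers the sum of squared errors, so the error never increases from one accepted iterate to the next.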

b. Conjugate Gradient Descent (CGD)

The CGD method solves systems of linear equations and can also be applied to systems whose matrix is not
symmetric, not positive-definite, or even not square [18]. CGD is an advanced method for training
multi-layer neural networks. Instead of a line search, the CGD method performs a plane search. The plane is
formed from an arbitrary linear combination of two vectors. For minimizing quadratic functions, the plane
search requires only the solution of a two-by-two set of linear equations for $\alpha$ and $\beta$. As an
example of a convex optimization problem that CGD can solve, consider the quadratic function:


$f(x, y) = c_1 x^2 + c_2 y^2 + c_3 xy$ (1)

where $c_1$, $c_2$ and $c_3$ are constant coefficients.


The Gradient Descent method tries to find the minimum by computing the gradient of f at the initial guess.
The whole process has to be iterated until the value of x is close to the optimal solution.
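The iteration can be illustrated with the classical linear conjugate gradient method applied to a small convex quadratic. The sketch below minimizes f(x) = 0.5*x^T A x - b^T x, which is equivalent to solving A x = b; the matrix A, the vector b and the starting point are hypothetical values chosen for illustration and are not the coefficients of the function above. This is the textbook method, not necessarily the variant implemented in the software used in this study.

```python
def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(A, b, x, tol=1e-24):
    """Minimize 0.5*x^T A x - b^T x for a symmetric positive-definite A."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual = negative gradient
    p = r[:]                                          # first search direction
    rs = dot(r, r)
    for _ in range(len(b)):                           # at most n steps in exact arithmetic
        if rs < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                       # exact step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs                            # makes the next direction conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric positive-definite matrix
b = [1.0, 2.0]
solution = conjugate_gradient(A, b, x=[2.0, 1.0])
print(solution)
```

For a two-variable quadratic the method reaches the minimizer in at most two steps, whereas plain gradient descent generally needs many more iterations.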
c. Quick Propagation (QP) 

The Quick Propagation method uses the following updating equation:

$B_{i+1} = B_i + u_i p_i^{T}$ (2)

where

$u_i = (\Delta y_i - B_i p_i) / (p_i^{T} p_i)$ (3)

$\Delta y_i = y_{i+1} - y_i$ (4)

Here $y_i$ is the model response for the ith iteration. The approximation of the Jacobian matrix $B_{i+1}$
for the (i + 1)th iteration is calculated using the Jacobian matrix approximation $B_i$, the parameter
perturbation vector $p_i$ and the change in the model response $\Delta y_i$ for the ith iteration.
The updating matrix $u_i p_i^{T}$ is a rank-one matrix, and
Broyden's method is a rank-one quick propagation method. The algorithm belongs to the group of
second-order learning methods, as it follows a quadratic approximation based on the previous gradient step and
the current gradient [19].
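The rank-one update in (2) to (4) can be written directly in Python. The sketch below is illustrative only: the starting matrix, the perturbation vector and the response change are hypothetical numbers. It also checks the property that motivates the update, namely that the new approximation reproduces the observed change, i.e. B_{i+1} p_i = Δy_i.

```python
def broyden_update(B, p, dy):
    """Return B + u p^T with u = (dy - B p) / (p^T p)."""
    Bp = [sum(bij * pj for bij, pj in zip(row, p)) for row in B]
    pp = sum(pj * pj for pj in p)
    u = [(dyi - bpi) / pp for dyi, bpi in zip(dy, Bp)]
    return [[bij + ui * pj for bij, pj in zip(row, p)] for row, ui in zip(B, u)]


B = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical current Jacobian approximation
p = [1.0, 2.0]                 # hypothetical parameter perturbation
dy = [3.0, 1.0]                # hypothetical change in model response

B_next = broyden_update(B, p, dy)
check = [sum(bij * pj for bij, pj in zip(row, p)) for row in B_next]
print(B_next)
print(check)
```

Because only a rank-one correction is added, the update needs no new derivative evaluations, which is what makes the method cheap per iteration.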

2.2. Error measures
According to [20], forecasting error measures how well a model performs
when compared with the past (observed) data. Two error measures are used in this study:
a. Root Mean Squared Error (RMSE) 
b. Mean Absolute Percentage Error (MAPE) 
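
The two measures are listed here without their formulae. A minimal Python sketch using the standard textbook definitions is given below: RMSE is the square root of the mean squared difference between actual and predicted values, and MAPE is the mean of the absolute errors expressed as a percentage of the actual values. The sample numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


actual = [100.0, 200.0, 400.0]      # hypothetical observed carried weights
predicted = [110.0, 190.0, 400.0]   # hypothetical model outputs

rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
print(rmse_value, mape_value)
```

RMSE is expressed in the units of the target variable and penalizes large errors more heavily, while MAPE is scale-free but becomes unstable when actual values are close to zero.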


2.3. Model validation 

The first stage is initial data preparation. During this stage, the data series is divided
into two parts. The first part, known as the within-sample or fitting part, is used to estimate the
forecasting model [21]. The second part, known as the out-of-sample or
evaluation part, is used to evaluate the model. In this study, 70% of the data are used for training and
the remaining 30% for validation. There are 13,152 observations.
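
A minimal sketch of the 70/30 partition is shown below. The text does not state whether the observations are shuffled before splitting, so the sketch assumes an ordered split in which the first 70% of the series is used for training; the integer list stands in for the real records.

```python
def split_series(observations, train_share=0.7):
    """Split an ordered series into a training part and a validation part."""
    cut = int(len(observations) * train_share)
    return observations[:cut], observations[cut:]


observations = list(range(13152))   # placeholder for the 13,152 records
train, validation = split_series(observations)
print(len(train), len(validation))
```

Keeping the validation part out of the fitting step is what allows the error measures on that part to indicate how the model behaves on unseen data.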

In the second stage, the within-sample data are used to estimate the model using the three built-in
algorithms, LM, CGD and QP. The best estimation approach is selected by comparing
their error measures [22]. For this purpose, RMSE and MAPE are used [23-24].
The training algorithm with the smallest error measures is taken to produce the best-fit model.

Having completed the first and second stages, the last stage is to use the best-fit model to forecast
the amount of weight carried by each train per trip, which can help KTMB plan its future operations.


3. RESULTS AND DISCUSSION 

Predictive modeling using the Artificial Neural Network was carried out using Alyuda
NeuroIntelligence software. In the first stage, the data are treated for missing values. Initially,
there were 12 routes. Routes whose missing values exceed 15% [25]
are omitted. The remaining two routes, Route 1 and Route 2, are analyzed further and undergo
imputation using IBM SPSS Modeler 18.0 software.
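
The screening rule can be sketched as follows. How the missing share is aggregated for each route is not specified in the text, so the sketch assumes a single count of missing records per route; the route names and counts are hypothetical.

```python
MISSING_THRESHOLD = 0.15   # routes above this share of missing values are omitted

# Hypothetical (missing records, total records) per route.
routes = {
    "Route A": (50, 1096),
    "Route B": (400, 1096),
    "Route C": (164, 1096),
}

kept = [
    name
    for name, (n_missing, n_total) in routes.items()
    if n_missing / n_total <= MISSING_THRESHOLD
]
print(kept)
```

Only the routes that pass this screen would then go on to the imputation step.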

Table 1 shows the summary statistics for the variables in Route 1 and Route 2 before imputation.
There are three continuous and four categorical variables. Only one categorical variable, Labour, has
missing values in Route 1 and Route 2, while all continuous
variables (Total Wagon, Tonnage/KM and Carried Weight) have missing values for both routes.
Therefore, imputation is needed for Labour and Total Wagon. However, Tonnage/KM does not undergo imputation.
For the target variable, Carried Weight, all cases with missing values are discarded for
both routes.


Table 1. Summary statistics for variables


Station                           Route 1            Route 2
Variable           Type           Valid   Missing    Valid   Missing
Train no           Categorical    1096    0          1096    0
Company/Customer   Categorical    1096    0          1096    0
Distance           Categorical    1096    0          1096    0
Labour             Categorical    922     174        844     252
Total Wagon        Continuous     905     191        900     196
Tonnage/KM         Continuous     1009    87         934     162
Carried Weight     Continuous     1009    87         934     162


3.1. Designing the network 

In order to choose the best training algorithm, the best network architecture is defined first.
For Route 1, shown in Table 2, eight candidate architectures were evaluated in the search for the best one.
However, six of them (marked in red in the original table) were removed to avoid overfitting,
because their number of hidden nodes is greater than the number of input nodes. Among the remaining
candidates, the best architecture is the [5-5-1] model, since it gives the largest fitness value, lowest test error
and lowest AIC. Table 3 shows that the best architecture for Route 2 is also the [5-5-1] model.

Table 2. Architecture network of ANN at Route 1


ID Architecture Fitness Test Error AIC Correlation R-Squared 
1 [5-1-1] 0.0859 11.6439 -2825.6098 0.9941 0.9882 
2 [5-13-1] 0.2371 4.2173 -3918.6457 0.9997 0.9994 
3 [5-8-1] 0.2057 4.8626 -3635.6821 0.9994 0.9988 
4 [5-5-1] 0.2043 4.8949 -3571.4968 0.9987 0.9974 
5 [5-11-1] 0.1751 5.7097 -3551.4642 0.9992 0.9984 
6 [5-9-1] 0.1538 6.5036 -3515.9084 0.9992 0.9984 
7 [5-6-1] 0.1716 5.8276 -3633.2912 0.9992 0.9984 
8 [5-7-1] 0.1828 5.4696 -3655.1763 0.9992 0.9984 


Table 3. Architecture network of ANN at Route 2

ID Architecture Fitness Test Error AIC Correlation R-Squared 
1 [5-11-1] 0.6865 1.4566 -3896.9178 0.9998 0.9996 
2 [5-9-1] 0.6262 1.5969 -3866.8793 0.9997 0.9994 
3 [5-12-1] 0.6085 1.6435 -3794.2906 0.9997 0.9994 
4 [5-10-1] 0.5914 1.6908 -3855.7806 0.9997 0.9994 
5 [5-13-1] 0.5288 1.8909 -3739.4992 0.9997 0.9994 
6 [5-8-1] 0.4002 2.4989 -3645.9149 0.9994 0.9988 
7 [5-5-1] 0.1217 8.2164 -2886.5162 0.9971 0.9942 
8 [5-1-1] 0.1117 8.9540 -2897.0075 0.9955 0.9910 
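
The selection rule described for Tables 2 and 3 can be sketched in Python: architectures with more hidden nodes than input nodes are excluded, and the candidate with the largest fitness is chosen from the rest. The fitness values are copied from the two tables; the function names are hypothetical, and ranking by fitness alone is a simplification of the comparison described in the text.

```python
# Fitness values from Table 2 (Route 1) and Table 3 (Route 2).
ROUTE_1 = {
    "[5-1-1]": 0.0859, "[5-13-1]": 0.2371, "[5-8-1]": 0.2057, "[5-5-1]": 0.2043,
    "[5-11-1]": 0.1751, "[5-9-1]": 0.1538, "[5-6-1]": 0.1716, "[5-7-1]": 0.1828,
}
ROUTE_2 = {
    "[5-11-1]": 0.6865, "[5-9-1]": 0.6262, "[5-12-1]": 0.6085, "[5-10-1]": 0.5914,
    "[5-13-1]": 0.5288, "[5-8-1]": 0.4002, "[5-5-1]": 0.1217, "[5-1-1]": 0.1117,
}


def layer_sizes(architecture):
    """Parse a label such as '[5-13-1]' into [5, 13, 1]."""
    return [int(n) for n in architecture.strip("[]").split("-")]


def best_architecture(fitness_by_architecture):
    """Drop architectures with more hidden than input nodes, then take the fittest."""
    eligible = {}
    for architecture, fitness in fitness_by_architecture.items():
        n_inputs, n_hidden, _ = layer_sizes(architecture)
        if n_hidden <= n_inputs:
            eligible[architecture] = fitness
    return max(eligible, key=eligible.get)


best_route_1 = best_architecture(ROUTE_1)
best_route_2 = best_architecture(ROUTE_2)
print(best_route_1, best_route_2)
```

Without the restriction on hidden nodes, the largest fitness would belong to [5-13-1] for Route 1 and [5-11-1] for Route 2, which shows that the overfitting rule drives the final choice.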


The fitness of the training algorithms is also compared. Table 4 shows that LM produces
the smallest Absolute Error and Network Error for both Route 1 and Route 2. Table 4 also shows
that LM produces the smallest error values (RMSE and MAPE) for both the training and validation parts for
Route 1 and Route 2.


Table 4. Comparison of error measures for training algorithms


Route 1
            Training                                            Validation
Algorithm   RMSE     MAPE      Absolute Error   Network Error   RMSE     MAPE      Absolute Error   Network Error
CGD         19.805   382.506   0.528            0.001           22.884   523.664   0.389            0
LM          4.323    18.222    0.341            0               5.383    28.98     0.32             0
QP          19.564   373.287   4.539            0               22.706   515.541   3.917            0

Route 2
            Training                                            Validation
Algorithm   RMSE     MAPE      Absolute Error   Network Error   RMSE     MAPE      Absolute Error   Network Error
CGD         5.532    29.843    4.5              0               57.294   3282.55   2.596            0
LM          4.807    22.533    0.669            0               24.43    596.835   0.505            0
QP          5.587    30.445    2.66             0               24.642   607.226   4.8              0
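
The comparison in Table 4 can be reproduced with a short Python sketch that picks, for each route, the algorithm with the smallest validation RMSE and uses MAPE as a tie-breaker. The validation figures are copied from Table 4; ranking on the validation part first is an assumption about how the comparison is carried out.

```python
# Validation (RMSE, MAPE) per training algorithm, taken from Table 4.
VALIDATION_ERRORS = {
    "Route 1": {"CGD": (22.884, 523.664), "LM": (5.383, 28.98), "QP": (22.706, 515.541)},
    "Route 2": {"CGD": (57.294, 3282.55), "LM": (24.43, 596.835), "QP": (24.642, 607.226)},
}


def best_algorithm(errors):
    """Return the algorithm with the smallest (RMSE, MAPE) pair."""
    return min(errors, key=lambda name: errors[name])


best_by_route = {route: best_algorithm(errors) for route, errors in VALIDATION_ERRORS.items()}
print(best_by_route)
```

For Route 2 the margin between LM and QP on the validation part is small, so the ranking there is less clear-cut than for Route 1.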


3.2. Forecasting by using the best training algorithm 

As previously discussed, the best training algorithm is used to predict the carried weight.
Hence, the ANN model with LM as the training algorithm is used for prediction on both routes,
Route 1 and Route 2.

Figure 2 illustrates the amount of carried weight forecast for 2019 on Route 1.
The grey line represents the forecast values and the dotted orange line represents their trend line,
whose negative slope indicates that the amount of carried weight on Route 1 declines slightly
over time. The trend equation is y = -0.0489x + 3249.8, so the total tonnage carried
decreases by 0.0489 tonnes each day. In other words, the forecast
carried weight fluctuates over time and decreases by 48.9 kg per day on average.
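
A trend line of this kind can be obtained by ordinary least squares. The Python sketch below generates points from the trend equation reported for Route 1, fits a line to them, and converts the slope to kilograms per day. It assumes that x is a day index and that y is measured in tonnes, which is inferred from the correspondence between 0.0489 and 48.9 kg; the actual forecast series is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


days = list(range(1, 366))                        # assumed day index for one year
weights = [-0.0489 * d + 3249.8 for d in days]    # points on the reported trend line

slope, intercept = fit_line(days, weights)
kg_per_day = slope * 1000.0                       # tonnes per day to kilograms per day
print(slope, intercept, kg_per_day)
```

With real forecast values the fitted line would not pass through every point, and the slope would summarize the average daily change.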

Figure 3 illustrates the amount of carried weight forecast for 2019 on Route 2.
The grey line represents the forecast values and the dotted orange line represents their trend line.
The trend line for Route 2 also has a negative slope, indicating that the amount of
carried weight on Route 2 decreases slightly over time. The trend equation is y = -0.1186x +
6079.3, which shows a decrease of 118.6 kg per day. Comparing the trend line of the new forecast carried
weight with the earlier trend in Section 4.5.2, the average daily decrease in carried
weight for Route 1 changes from 69.1 kg per day to only 48.9 kg per day. Since the decline is slowing,
this suggests that the cargo business on Route 1 is improving.
On the other hand, for Route 2, the average daily decrease in carried weight changes from 117.8 kg per
day to 118.6 kg per day. However, this change is only 0.68% and very minimal.


Int J Artif Intell, ISSN: 2252-8938, Vol. 9, No. 3, September 2020: 480-487


[Figure 2: line plot titled "Amount of Carried Weight at Route 1"; x-axis: Date (1/1/2016 to 1/1/2019); series: Predicted, Forecast, Linear (Predicted); trend line y = -0.0489x + 3249.8]

Figure 2. Amount of carried weight forecast for Route 1


[Figure 3: line plot titled "Amount of Carried Weight at Route 2"]

Figure 3. Amount of carried weight forecast for Route 2


4. CONCLUSION 

This paper presents the application of the Artificial Neural Network (ANN) to predict the amount
of carried weight of cargo trains, using three training algorithms: the Levenberg-Marquardt algorithm (LM),
which has performed well in predicting different sets of carried weight data; Conjugate Gradient Descent
(CGD), which has performed well in predicting other sets of data; and Quick Propagation (QP),
as a new algorithm used to predict carried weight. The results show that the
Artificial Neural Network is appropriate for predicting the amount of carried weight, based on the correlation,
fitness and test error values. Based on the RMSE and MAPE, LM gives the smallest values for Route 1 and
Route 2, which carry cement cargo for KTMB customers.

Furthermore, the ANN model based on the best training algorithm found in the first phase of the
study is used to forecast the carried weight of cargo trains for both routes in the rail transportation system.
The results show that the carried weight fluctuates and declines over time for 2019 (365 days ahead).
It is hoped that the results can help KTMB to plan the right amount to be carried by its cargo trains per trip
in its effort to prevent further derailments. At the same time, as the amount of carried weight is
predicted to decline over time, KTMB can plan a strategic initiative to attract more customers while
monitoring the right amount to carry on each trip.